Abstract


This report describes an algorithm for detecting military vehicles in FLIR
imagery that will be used as a prescreener to eliminate large areas of the
image from further analysis. The output is a list of likely target locations
with confidence numbers to be sent to a more complex clutter-rejection
algorithm for analysis. The algorithm uses simple features and is intended to
be applicable to a wide variety of target-sensor geometries, sensor
configurations, and applications.

1.  Introduction


We designed the algorithm described in this report to address the need for
a detection algorithm that could serve as a prescreener/detector for a wide
range of applications. Most automatic target detection/recognition
(ATD/R) algorithms use a great deal of problem-specific knowledge to improve
performance, but the result is an algorithm that is tailored to specific
target types and poses. The approximate range to target is often required,
with varying amounts of tolerance. For example, in some scenarios, it is
assumed that the range is known to within one meter from a laser range finder
or a digital map. In other scenarios, only the range to the center of the
field of view and the depression angle are known, so that a flat-earth
approximation provides the best estimate. Many algorithms, both model-based
and learning-based, either require accurate range information or compensate
for inaccurate information by attempting to detect targets at a number of
different ranges within the range tolerance. Because many such algorithms are
quite sensitive to scale, even a modest range tolerance requires that the
algorithm attempt to match at a large number of closely spaced scales,
driving up both the computational complexity and the false alarm rate. Such
algorithms have often used view-based neural networks [1-3] or statistical
methods [4].

The  proximate  motivation  for  developing  the  scale-insensitive  algorithm 
was  to  provide  a  fast  prescreener  for  a  robotic  application  for  which  no 
range  information  was  available.  Instead,  the  algorithm  attempted  to  find 
targets  at  all  ranges  between  some  reasonable  minimum,  determined  from 
operational  requirements,  and  the  maximum  effective  range  of  the  sensor. 

Another motivation was to develop an algorithm that could be applied to
a wide variety of image sets and sensor types without the severe degradation
in performance that commonly occurs with learning algorithms, such
as neural networks and principal component analysis-based methods, that
have been trained on a limited variety of sensor types, terrain types, and
environmental conditions. While we recognize that with a suitable training
set, learning algorithms will often perform better than other methods, such
a scenario typically requires a large and expensive training set, which is
sometimes not feasible.

The dataset used in training and testing this system was the April 1992
Comanche forward looking infrared (FLIR) collection at Ft. Hunter Liggett,
CA. This dataset consists of 1225 images, each of which is 720 by 480 pixels.
Each image has a field of view of approximately 1.75 degrees squared.

Each image contains one or two targets in a hilly wooded background.
Ground truth was available that provided target centroid, range to target,
target type, target aspect, range to center of field of view, and depression
angle. The target centroid and range to target were used to score the
algorithm, as described in the experimental results section, but none of the
target-specific information was used in the testing process. The algorithm
assumes that only the vertical and horizontal fields of view and the pixel
geometry are known. The only range information used is the operational
minimum range and the maximum effective range of the sensor.


3.  Features 


Each feature is calculated for every pixel in the image. As more complex
features are added in the future, it might become beneficial to calculate
some of the features only at those locations for which the other feature
values are high. Although each feature needs a range in order to determine
the approximate target size in pixels, the features are not highly range
sensitive. The algorithm therefore calculates each feature at coarsely
sampled ranges between the minimum and maximum allowed range.
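
As a rough illustration of how a hypothesized range turns into a target-sized box in pixels, the following sketch uses a small-angle approximation and geometrically spaced ranges. The function names, the spacing ratio, and the example numbers are assumptions made for illustration; the report does not state how the ranges are sampled.

```python
import math

def target_size_pixels(range_m, fov_deg, n_pixels, size_m):
    """Approximate extent, in pixels, of an object size_m metres long at range_m.

    Assumes square pixels spread evenly over the field of view and uses the
    small-angle approximation (angle ~ size / range).
    """
    ifov_rad = math.radians(fov_deg) / n_pixels   # angle subtended by one pixel
    return size_m / (range_m * ifov_rad)

def coarse_ranges(r_min, r_max, ratio=1.5):
    """Geometrically spaced ranges from r_min up to and including r_max.

    The spacing ratio is a hypothetical choice, not a value from the report.
    """
    ranges = []
    r = r_min
    while r < r_max:
        ranges.append(r)
        r *= ratio
    ranges.append(r_max)
    return ranges

# Example: a 7 m long vehicle seen by a 720-pixel-wide, 1.75 degree sensor.
for r in coarse_ranges(500, 4000, ratio=2.0):
    print(r, round(target_size_pixels(r, 1.75, 720, 7.0), 1))
```

Because the features are not highly range sensitive, a handful of such range hypotheses may be enough to cover the whole interval, which keeps the cost well below that of matching at many closely spaced scales.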

Each feature described below was chosen based on intuition, with the
criteria that it be monotonic and computationally simple. The features are
described in decreasing order of importance.

3.1  Maximum  Grey  Level,  Feature  0 

The maximum grey level is the highest grey level within a roughly
target-sized rectangle centered on the pixel. We chose it because in many FLIR
images of vehicles, a few pixels are significantly hotter than the rest of the
target or the background. These pixels are usually on the engine, the
exhaust manifold, or the exhaust pipe. The feature is calculated as

$$F^{0}_{i,j} = \max_{(k,l) \in N_{in}(i,j)} f(k,l), \qquad (1)$$

where $f(k,l)$ is the grey-level value of the pixel in the $k$th row and
$l$th column, and $N_{in}(i,j)$ is the neighborhood of the pixel $(i,j)$,
defined as a rectangle whose width is the length of the longest vehicle in
the target set and whose height is that of the tallest vehicle in the target
set. For the applications we have considered, the width is 7 m and the
height, 3 m.
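
Equation (1) amounts to a sliding-window maximum. A minimal NumPy sketch follows; the function name, the box size given in pixels, and the edge-replicating border handling are assumptions, since the report does not say how image borders are treated.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def max_grey_level(image, box_h, box_w):
    """Feature 0: maximum grey level in a box_h x box_w box centred on each pixel.

    box_h and box_w are the target-sized neighbourhood in pixels. Borders are
    handled by replicating edge pixels (an assumption of this sketch).
    """
    top, left = box_h // 2, box_w // 2
    padded = np.pad(
        image,
        ((top, box_h - 1 - top), (left, box_w - 1 - left)),
        mode="edge",
    )
    # windows has shape (H, W, box_h, box_w): one box per output pixel.
    windows = sliding_window_view(padded, (box_h, box_w))
    return windows.max(axis=(2, 3))

# Example: a single hot pixel spreads over every box that contains it.
img = np.zeros((9, 9))
img[4, 4] = 10.0
feature0 = max_grey_level(img, 3, 5)
```

This direct form examines every pixel of every box, so it is meant to show the definition rather than to be fast on full 720 by 480 frames.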


3.2  Contrastbox,  Feature  1 


The contrastbox feature measures the average grey level over a target-sized
region and compares it to the grey level of the local background. We chose
this feature because many pixels that are not on the engine or on other
particularly hot portions of the target are still somewhat warmer than the
natural background. This feature has been used by a large number of authors
and is calculated as


$$F^{1}_{i,j} = \frac{1}{n_{in}} \sum_{(k,l) \in N_{in}(i,j)} f(k,l) - \frac{1}{n_{out}} \sum_{(k,l) \in N_{out}(i,j)} f(k,l), \qquad (2)$$

where $n_{out}$ is the number of pixels in $N_{out}(i,j)$, $n_{in}$ is the
number of pixels in $N_{in}(i,j)$, and $N_{in}(i,j)$ is the target-sized
neighborhood defined above. The neighborhood $N_{out}(i,j)$ contains all of
the pixels in a larger rectangle around $(i,j)$, except those pixels in
$N_{in}(i,j)$.
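
Equation (2) can be evaluated for every pixel at once with box sums taken from an integral image. The sketch below assumes box sizes in pixels, an outer rectangle strictly larger than the inner one, and edge-replicated borders; the function names and the size of the outer rectangle are illustrative, as the report does not give them.

```python
import numpy as np

def box_sum(image, box_h, box_w):
    """Sum of pixel values in a box_h x box_w box centred on each pixel."""
    top, left = box_h // 2, box_w // 2
    padded = np.pad(image.astype(float),
                    ((top, box_h - 1 - top), (left, box_w - 1 - left)),
                    mode="edge")
    # Integral image with a leading row and column of zeros.
    ii = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    return (ii[box_h:, box_w:] - ii[:-box_h, box_w:]
            - ii[box_h:, :-box_w] + ii[:-box_h, :-box_w])

def contrastbox(image, inner, outer):
    """Feature 1: mean over the inner box minus mean over the surrounding ring."""
    (ih, iw), (oh, ow) = inner, outer
    s_in = box_sum(image, ih, iw)
    s_ring = box_sum(image, oh, ow) - s_in      # outer rectangle minus inner box
    n_in = ih * iw
    n_out = oh * ow - n_in
    return s_in / n_in - s_ring / n_out

# Example: a warm 3 x 3 patch on a cold background.
img = np.zeros((15, 15))
img[6:9, 6:9] = 9.0
feature1 = contrastbox(img, inner=(3, 3), outer=(7, 7))
```

With the integral image, the cost per pixel is independent of the box size, which suits a prescreener that must run over the whole image at several range hypotheses.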

3.3  Average  Gradient  Strength,  Feature  2

We chose the gradient-strength feature because manmade objects tend to
show sharper internal detail than natural objects do, even when the average
intensity is similar. To prevent large regions of background that show
higher than normal variation from showing a high value for this feature,
we subtract the average gradient strength of the local background from the
average gradient strength of the target-sized region. The feature is
calculated as

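$$F^{2}_{i,j} = \frac{1}{n_{in}} \sum_{(k,l) \in N_{in}(i,j)} G^{in}(i,j) - \frac{1}{n_{out}} \sum_{(k,l) \in N_{out}(i,j)} G^{out}(i,j), \qquad (3)$$

where

$$G^{in}(i,j) = G^{h}_{in}(i,j) + G^{v}_{in}(i,j), \qquad (4)$$

$$G^{h}_{in}(i,j) = \sum_{(i,j) \in N_{in}} \left| f(i,j) - f(i,j+1) \right|, \qquad (5)$$

$$G^{v}_{in}(i,j) = \sum_{(i,j) \in N_{in}} \left| f(i,j) - f(i+1,j) \right|, \qquad (6)$$

and $G^{out}(i,j)$ is defined similarly.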
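
One reading of equations (3) through (6), following the prose description, is the mean of the per-pixel absolute horizontal and vertical differences over the target-sized box minus the same mean over the surrounding ring. The sketch below implements that reading; the function names, box sizes, and border handling are assumptions and not the authors' implementation.

```python
import numpy as np

def box_sum(image, box_h, box_w):
    """Sum of pixel values in a box_h x box_w box centred on each pixel."""
    top, left = box_h // 2, box_w // 2
    padded = np.pad(image.astype(float),
                    ((top, box_h - 1 - top), (left, box_w - 1 - left)),
                    mode="edge")
    ii = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    return (ii[box_h:, box_w:] - ii[:-box_h, box_w:]
            - ii[box_h:, :-box_w] + ii[:-box_h, :-box_w])

def gradient_magnitude(image):
    """|f(i,j) - f(i,j+1)| + |f(i,j) - f(i+1,j)|, zero where a neighbour is missing."""
    f = image.astype(float)
    g = np.zeros_like(f)
    g[:, :-1] += np.abs(f[:, :-1] - f[:, 1:])   # horizontal differences
    g[:-1, :] += np.abs(f[:-1, :] - f[1:, :])   # vertical differences
    return g

def gradient_strength(image, inner, outer):
    """Feature 2: mean gradient in the inner box minus mean gradient in the ring."""
    (ih, iw), (oh, ow) = inner, outer
    g = gradient_magnitude(image)
    s_in = box_sum(g, ih, iw)
    s_ring = box_sum(g, oh, ow) - s_in
    return s_in / (ih * iw) - s_ring / (oh * ow - ih * iw)

# Example: a textured 5 x 5 patch on a flat background.
img = np.zeros((21, 21))
img[8:13, 8:13] = 10.0 * (np.indices((5, 5)).sum(axis=0) % 2 == 0)
feature2 = gradient_strength(img, inner=(5, 5), outer=(11, 11))
```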

3.4  Local  Variation,  Feature  3 

The  local  variation  feature  is  calculated  as 

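$$F^{3}_{i,j} = \frac{1}{n_{in}} \sum_{(k,l) \in N_{in}(i,j)} L^{in}(i,j) - \frac{1}{n_{out}} \sum_{(k,l) \in N_{out}(i,j)} L^{out}(i,j), \qquad (7)$$

where

$$L^{in}(i,j) = \sum_{(k,l) \in N_{in}(i,j)} \left| f(k,l) - \mu_{in}(i,j) \right| \qquad (8)$$

and

$$\mu_{in}(i,j) = \frac{1}{n_{in}} \sum_{(k,l) \in N_{in}(i,j)} f(k,l). \qquad (9)$$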
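
The local variation feature can be read as the mean absolute deviation from the local mean inside the target-sized box, minus the corresponding quantity for the surrounding ring. The sketch below follows that reading. It assumes that the background term uses the mean of the ring itself, which the report does not state, and the function names and box sizes are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def centred_windows(image, box_h, box_w):
    """View of shape (H, W, box_h, box_w): the box centred on each pixel."""
    top, left = box_h // 2, box_w // 2
    padded = np.pad(image.astype(float),
                    ((top, box_h - 1 - top), (left, box_w - 1 - left)),
                    mode="edge")
    return sliding_window_view(padded, (box_h, box_w))

def local_variation(image, inner, outer):
    """Feature 3: mean absolute deviation in the inner box minus that of the ring."""
    (ih, iw), (oh, ow) = inner, outer
    win = centred_windows(image, oh, ow)
    # Position of the inner box inside each outer window.
    r0, c0 = oh // 2 - ih // 2, ow // 2 - iw // 2
    in_mask = np.zeros((oh, ow), dtype=bool)
    in_mask[r0:r0 + ih, c0:c0 + iw] = True

    def mean_abs_dev(mask):
        n = mask.sum()
        mu = (win * mask).sum(axis=(2, 3), keepdims=True) / n   # local mean
        return (np.abs(win - mu) * mask).sum(axis=(2, 3)) / n

    return mean_abs_dev(in_mask) - mean_abs_dev(~in_mask)

# Example: a textured 5 x 5 patch on a flat background.
img = np.zeros((21, 21))
img[8:13, 8:13] = 10.0 * (np.indices((5, 5)).sum(axis=0) % 2 == 0)
feature3 = local_variation(img, inner=(5, 5), outer=(11, 11))
```

This form materializes one outer window per pixel, so it is practical only for small images or small boxes; it is written for clarity rather than speed.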


3.5  How  Features  Were  Selected 

A full description of the feature selection is outside the scope of this
report. We programmed a large number of features and calculated the value of
these features over a large number of randomly selected pixels in the
images of the training set. We also calculated the feature values at the
ground truth location of the targets. We computed histograms for each feature
for both the target and background pixels and calculated a measure of
separability. We also calculated the correlation of the features to avoid
choosing several features that are similar. Some of the features were highly
correlated, which was expected because one of the purposes of the training
was to determine which of several similar features provided the greatest
separability. For example, a number of contrast features were used, which
normalized the target and background values by the local standard deviation
of the background, or of the target, or neither. Similarly, a number of
gradient-strength features were calculated. The feature-pruning process was
ad hoc; thus it would be reasonable to expect that performance improvement
could be obtained by the use of a more rigorous approach.




4.  Combining  Features 


Each feature is normalized across the image so that the feature value at each
pixel represents the number of standard deviations that the pixel stands
apart from the values for the same feature across the image. Thus the
feature image for the $m$th feature is normalized as

$$F^{m,N}_{i,j} = \frac{F^{m}_{i,j} - \mu_m}{\sigma_m}, \qquad (10)$$

where

$$\mu_m = \frac{1}{M} \sum_{\text{all } (k,l)} F^{m}_{k,l} \qquad (11)$$

and

$$\sigma_m = \sqrt{\frac{1}{M} \sum_{\text{all } (k,l)} \left( F^{m}_{k,l} - \mu_m \right)^2}. \qquad (12)$$

After normalization, the features, each of which is calculated for each pixel,
are linearly combined into a confidence image,

$$G_{i,j} = \sum_{m=0}^{3} \omega_m F^{m,N}_{i,j}, \qquad (13)$$

where the feature weights $\omega_m$ are determined with the use of an
algorithm not described here. The confidence value of each pixel is mapped by
a scaling function $S : \mathbb{R} \rightarrow [0,1]$, as

$$S(G_{i,j}) = 1 - e^{\alpha G_{i,j}}, \qquad (14)$$

where $\alpha$ is a constant.
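
The per-image normalization and the linear combination can be sketched in a few lines of NumPy. The weights in the example are placeholders, since the report does not give the trained values, and the function names are illustrative. The scaling function is left out of the sketch.

```python
import numpy as np

def normalize_feature(feature):
    """Express each pixel as a number of standard deviations from the image mean."""
    mu = feature.mean()
    sigma = feature.std()            # population form, dividing by the pixel count
    if sigma == 0:
        return np.zeros_like(feature, dtype=float)   # flat feature image
    return (feature - mu) / sigma

def confidence_image(features, weights):
    """Weighted sum of the normalized feature images."""
    return sum(w * normalize_feature(f) for w, f in zip(weights, features))

# Example with two toy feature images and hypothetical weights.
f0 = np.arange(12.0).reshape(3, 4)
f1 = f0 ** 2
conf = confidence_image([f0, f1], weights=[0.6, 0.4])
```

Normalizing per image means that the weights apply to values with comparable spread, whatever the original units of each feature were.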

This scaling does not change the relative value of the various pixels; it
merely scales them to the interval [0, 1] for convenience. Confidence
numbers are often limited to this interval because they are estimates of the
a posteriori probability. While this is not true for our algorithm, the use
of this interval is convenient for evaluators.

To determine the detection locations from the scaled confidence image, we
choose the pixel with the maximum confidence value. Then a target-sized
neighborhood around that pixel is set to zero so that the search for
subsequent detections will not choose a pixel location corresponding to the
same target. The process is repeated an integer number of times, where the
integer is chosen a priori.
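
The detection step described above is a greedy peak search with suppression. A minimal sketch, assuming the scaled confidence image is a NumPy array and the target-sized neighborhood is given in pixels (the function and parameter names are illustrative):

```python
import numpy as np

def pick_detections(confidence, n_detections, box_h, box_w):
    """Return up to n_detections (row, col, confidence) tuples, strongest first."""
    conf = confidence.astype(float).copy()     # leave the caller's image intact
    detections = []
    for _ in range(n_detections):
        i, j = np.unravel_index(np.argmax(conf), conf.shape)
        detections.append((int(i), int(j), float(conf[i, j])))
        # Zero a target-sized neighbourhood so the same target is not picked again.
        r0 = max(i - box_h // 2, 0)
        c0 = max(j - box_w // 2, 0)
        conf[r0:i + box_h // 2 + 1, c0:j + box_w // 2 + 1] = 0.0
    return detections

# Example: two nearby peaks on one target and a third peak elsewhere.
conf = np.zeros((20, 20))
conf[5, 5], conf[5, 6], conf[15, 15] = 0.9, 0.8, 0.7
hits = pick_detections(conf, n_detections=2, box_h=3, box_w=5)
```

In the example, the weaker peak next to the first detection is suppressed, so the second detection comes from the separate peak.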

5.  Experimental  Results


The training results on the Hunter Liggett April 1992 ROI database are
shown in the receiver operating characteristic (ROC) curve in figure 1.
Figure 2 shows test results on the February 1992 ROI database collected at
Yuma Proving Ground (YPG), and figure 3 shows test results on the Greyling
August 1992 ROI database. The Yuma test data are much more difficult
because they were taken in the desert in July, so many locations in the
image have a higher apparent temperature than that of the targets. The data
from Greyling, Michigan are significantly easier because the temperatures
are milder, and the data are comparable in difficulty to the training data.
Note that no training data were used from anywhere but Hunter Liggett,
so the results suggest that the algorithm is not sensitive to the training
background. This is not surprising given the simplicity of the algorithm.
Learning algorithms, by contrast, are often sensitive to the training
background.

Figures 4 and 5 show a sample image and the results of the algorithm on
the image. The crosses in figure 5 denote the ground-truth targets, and the
x's denote the detections on the targets. Detections are designated hits if
the detection center falls anywhere on the actual target; otherwise, they are
designated false alarms. The top three detections, ranked by confidence
number, are designated on the image. The top two detections are hits, while
the third falls near the target and is designated a false alarm. Figures 6
and 7 show another, somewhat more difficult image and the associated
algorithm results. The top detection falls on a target in the bottom left of
the image, while the second highest detection is a false alarm near the
center of the image. Although the location looks like a possible target, it
is merely a warm spot on the dirt road.

The algorithm, with relatively minor modifications, has been used by the
Demo III unmanned ground vehicle (UGV) program to reduce the amount
of imagery that must be transmitted via radio link to a human user. It will
also be used by the Sensors for UGV program at the Night Vision and
Electronic Sensors Directorate to prescreen uncooled FLIR imagery and to
indicate potential targets that should be looked at more closely with an
active laser sensor. The algorithm has also been used as a synthetic
image-validation tool, by comparing its performance on synthetic imagery
with its performance on real imagery.

Figure 3. ROC curve on Greyling August 1992 imagery.

Figure 4. Easy image from Hunter Liggett April 1992 dataset.

Figure 5. Results on image in figure 4.

Figure 6. Moderate image from Hunter Liggett April 1992 dataset.

Figure 7. Results on image in figure 6.

6.  Conclusions  and  Future  Work


Future work might include a more systematic evaluation of potential
features and an improved classification scheme that allows useful features
that appear rarely to be incorporated. For example, in a small minority of
FLIR images of targets, a windshield reflects cold sky, causing a few pixels
to be extremely dark. The current scheme is not set up to incorporate such
features: because the feature is seldom useful, its weighting would be quite
low.